2,550 research outputs found

    Transductive Learning with String Kernels for Cross-Domain Text Classification

    In many text classification tasks, a major problem is the lack of labeled data in the target domain. Although classifiers for a target domain can be trained on labeled text data from a related source domain, the accuracy of such classifiers is usually lower in the cross-domain setting. Recently, string kernels have obtained state-of-the-art results in various text classification tasks such as native language identification and automatic essay scoring. Moreover, classifiers based on string kernels have been found to be robust to the distribution gap between different domains. In this paper, we formally describe an algorithm composed of two simple yet effective transductive learning approaches that further improve the results of string kernels in cross-domain settings. By adapting string kernels to the test set without using the ground-truth test labels, we report significantly better accuracy rates in cross-domain English polarity classification. Comment: Accepted at ICONIP 2018. arXiv admin note: substantial text overlap with arXiv:1808.0840
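    String kernels compare texts through the character n-grams they share. As an illustration only, assuming the p-spectrum kernel (this abstract does not say which string-kernel variants the paper uses), the sketch below computes a cosine-normalized kernel matrix; all helper names and sample texts are hypothetical. Because the kernel needs only raw text, the similarities between test and training documents can be computed without any test labels, which is the property a transductive approach can exploit.

```python
import math
from collections import Counter


def char_ngrams(text: str, p: int) -> Counter:
    """Count every contiguous character p-gram in `text`."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(a: str, b: str, p: int = 3) -> int:
    """p-spectrum kernel: dot product of the two p-gram count profiles."""
    ca, cb = char_ngrams(a, p), char_ngrams(b, p)
    small, large = (ca, cb) if len(ca) <= len(cb) else (cb, ca)
    # Iterate over the smaller profile; a Counter returns 0 for absent p-grams.
    return sum(count * large[gram] for gram, count in small.items())


def normalized_kernel(a: str, b: str, p: int = 3) -> float:
    """Cosine-normalize the kernel so that k(x, x) == 1 for non-empty profiles."""
    denom = math.sqrt(spectrum_kernel(a, a, p) * spectrum_kernel(b, b, p))
    return spectrum_kernel(a, b, p) / denom if denom else 0.0


def kernel_matrix(rows: list[str], cols: list[str], p: int = 3) -> list[list[float]]:
    """Pairwise normalized kernel values; needs raw text only, no labels."""
    return [[normalized_kernel(r, c, p) for c in cols] for r in rows]


# Hypothetical source-domain training texts and unlabeled target-domain texts.
train_docs = ["a truly great movie", "a dull and boring plot"]
test_docs = ["great plot, great cast"]
K_test = kernel_matrix(test_docs, train_docs)
```

    The normalization keeps long and short documents on a comparable scale; a kernel classifier such as an SVM would consume `K_test` as its test-versus-train Gram matrix.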

    Language variety identification using distributed representations of words and documents

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-24027-5_3. In this work we focus on the use of distributed representations of words and documents built with the continuous Skip-gram model. We compare this model with three recent approaches (Information Gain Word-Patterns, TF-IDF graphs and Emotion-labeled Graphs) as well as several baselines. We evaluate the models on Hispablogs, a new dataset that we introduce, consisting of Spanish blogs from five countries: Argentina, Chile, Mexico, Peru and Spain. Experimental results show state-of-the-art performance in language variety identification.
    This research has been carried out within the framework of the European Commission WIQ-EI IRSES (no. 269180) and DIANA - Finding Hidden Knowledge in Texts (TIN2012-38603-C02) projects. The work of the second author was partially funded by Autoritas Consulting SA and by the Spanish Ministry of Economics by means of an ECOPORTUNITY IPT-2012-1220-430000 grant.
    Franco Salvador, M.; Rangel, F.; Rosso, P.; Taulé, M.; Martí, MA. (2015). Language variety identification using distributed representations of words and documents. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: 6th International Conference of the CLEF Association, CLEF'15, Toulouse, France, September 8-11, 2015, Proceedings. Springer International Publishing. 28-40. https://doi.org/10.1007/978-3-319-24027-5_3
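    The continuous Skip-gram model learns a vector for each word by training it to predict the words around it. A minimal sketch of how its (center, context) training pairs are extracted from a tokenized text, assuming a fixed window size; the real word2vec tool additionally samples smaller windows and subsamples frequent words, and the function name and example tokens here are illustrative.

```python
def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        # Clip the context window at both ends of the sequence.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Skip-gram then trains word vectors so that each center word predicts its contexts.
example_pairs = skipgram_pairs(["el", "coche", "es", "rojo"], window=1)
```

    Document-level representations can then be derived from the learned word vectors; how the paper builds them is not stated in this abstract.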

    Knowledge Graphs as Context Models: Improving the Detection of Cross-Language Plagiarism with Paraphrasing

    Cross-language plagiarism detection attempts to automatically identify and extract plagiarism across documents in different languages. Plagiarized fragments can be verbatim translations, or their structure may be altered to hide the copying; the latter is known as paraphrasing and is more difficult to detect. To improve paraphrase detection, we use a knowledge graph-based approach to obtain and compare context models of document fragments in different languages. Experimental results in German-English and Spanish-English cross-language plagiarism detection indicate that our knowledge graph-based approach performs better than other state-of-the-art models.
    The research has been carried out in the framework of the European Commission WIQ-EI IRSES (no. 269180) and DIANA-APPLICATIONS - Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) projects, as well as the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.
    Franco-Salvador, M.; Gupta, P.; Rosso, P. (2013). Knowledge Graphs as Context Models: Improving the Detection of Cross-Language Plagiarism with Paraphrasing. In Bridging Between Information Retrieval and Databases: PROMISE Winter School 2013, Bressanone, Italy, February 4-8, 2013. Revised Tutorial Lectures. Springer Verlag (Germany). 227-236. https://doi.org/10.1007/978-3-642-54798-0_12
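    One way to compare two context graphs whose nodes are language-independent concepts is a Dice-style overlap of their concepts and of their relations. The sketch below assumes such a measure, combining concept similarity Sc and relation similarity Sr as Sc·(a + b·Sr) with hypothetical weights a = b = 0.5; it is not necessarily the exact similarity used in the paper, and the concept IDs are hypothetical stand-ins for entries of a multilingual knowledge graph.

```python
from dataclasses import dataclass


@dataclass
class ContextGraph:
    """A fragment's context model: weighted concepts plus typed relations."""
    concepts: dict[str, float]             # concept id -> weight
    relations: set[tuple[str, str, str]]   # (source concept, relation, target concept)


def concept_similarity(g1: ContextGraph, g2: ContextGraph) -> float:
    """Weighted Dice overlap of the two concept sets."""
    shared = g1.concepts.keys() & g2.concepts.keys()
    total = sum(g1.concepts.values()) + sum(g2.concepts.values())
    if total == 0:
        return 0.0
    return sum(g1.concepts[c] + g2.concepts[c] for c in shared) / total


def relation_similarity(g1: ContextGraph, g2: ContextGraph) -> float:
    """Dice overlap of the two relation (edge) sets."""
    total = len(g1.relations) + len(g2.relations)
    return 2 * len(g1.relations & g2.relations) / total if total else 0.0


def graph_similarity(g1: ContextGraph, g2: ContextGraph, a: float = 0.5, b: float = 0.5) -> float:
    """Combine both overlaps as Sc * (a + b * Sr); a and b are hypothetical weights."""
    return concept_similarity(g1, g2) * (a + b * relation_similarity(g1, g2))


# Hypothetical graphs for a German and an English fragment after concept linking.
de = ContextGraph({"CAR": 1.0, "DRIVE": 1.0}, {("CAR", "used_for", "DRIVE")})
en = ContextGraph({"CAR": 1.0, "ROAD": 1.0}, {("CAR", "on", "ROAD")})
```

    Because both graphs are expressed in shared concept IDs, the comparison itself is language-independent; the translation burden moves to the step that links words to concepts.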

    Effect of carbon nanotubes on methane production in pure cultures of methanogens and in a syntrophic co-culture

    ICBM-3 - 3rd International Conference on Biogas Microbiology. Conductive materials have been reported to enhance methane production by anaerobic microbial communities from a wide variety of substrates [1]. The mechanisms involved are far from fully understood. Many studies suggest that these materials facilitate direct interspecies electron transfer (DIET) between electrogenic bacteria and methanogens, and that this mechanism even dominates over interspecies hydrogen and formate transfer [2,3]. The effect of conductive materials on pure cultures of methanogens, or on co-cultures of typical fatty acid-degrading syntrophs with methanogenic partners, had never been studied. In this work, the effect of carbon nanotubes (CNT) on the activity of pure cultures of Methanobacterium formicicum, Methanospirillum hungatei, Methanosarcina mazei and Methanosaeta concilii, and of the co-culture of Syntrophomonas wolfei and Methanospirillum hungatei, was evaluated. The results showed that CNT affect methane production by methanogens. The initial methane production rate (MPR) increased 17-fold and 6-fold when M. formicicum and M. hungatei, respectively, were incubated with 5 g·L⁻¹ CNT. The activities of M. mazei and M. concilii were higher at CNT concentrations of 0.1 to 1 g·L⁻¹, but lower at 5 g·L⁻¹. Increasing CNT concentrations resulted in more negative redox potentials, which correlated with the increased methanogenic activity. Remarkably, in the absence of a reducing agent but in the presence of CNT, the MPR was higher than in incubations with reducing agent, whereas no growth was observed in the absence of both reducing agent and CNT. The MPR from butyrate increased 1.5-fold in the presence of CNT (5 g·L⁻¹), showing a positive effect of CNT on the syntrophic co-culture. No evidence of DIET in the presence of CNT was obtained. Rather, CNT directly affect the activity of methanogens, which creates new opportunities to improve methane production from waste and wastewater in anaerobic digesters.

    Performance of parallel FDTD method for shared- and distributed-memory architectures: Application to bioelectromagnetics

    This work provides an in-depth study of the computational performance of the parallel finite-difference time-domain (FDTD) method. Parallelization is applied at several levels: the shared-memory (OpenMP) and distributed-memory (MPI) paradigms, as well as vectorization, on three different architectures: Intel's Knights Landing and Skylake, and Cavium's ARM-based ThunderX2. This study systematically confirms the claim, well established in the computational electromagnetics community, that memory bandwidth is the main factor limiting FDTD performance in realistic problems. Consequently, a memory-bandwidth threshold for attaining optimal performance can be assessed as a function of problem size. Finally, the results of this study were used to optimize the workload balancing of the simulation of a bioelectromagnetic problem consisting of the exposure of a human model to a reverberation-chamber-like environment.
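    To illustrate why an FDTD update tends to be memory-bound, here is a minimal one-dimensional Yee-scheme sketch in normalized units (cell size 1, wave speed 1, perfectly conducting walls at both ends). The grid size, Courant number and pulse width are arbitrary illustrative choices, not parameters from the study, and the byte count assumes a compiled loop that streams each float64 array once; NumPy's temporaries move even more data.

```python
import numpy as np


def fdtd_1d(n_cells: int = 401, steps: int = 100, courant: float = 0.5, width: float = 10.0):
    """1D Yee-scheme FDTD in normalized units with PEC walls (E = 0 at both ends)."""
    x = np.arange(n_cells)
    e = np.exp(-((x - n_cells // 2) ** 2) / (2 * width ** 2))  # initial Gaussian E pulse
    e[0] = e[-1] = 0.0                      # PEC boundaries, never updated below
    h = np.zeros(n_cells - 1)               # H lives on the half-integer grid between E nodes
    for _ in range(steps):
        h += courant * (e[1:] - e[:-1])         # update H from the spatial difference of E
        e[1:-1] += courant * (h[1:] - h[:-1])   # update interior E from the difference of H
    return e, h


# Rough arithmetic intensity of the interior E update, assuming float64 and that
# e is read and written once and h is read once per cell (h[i-1] reused from cache).
FLOPS_PER_CELL = 3          # one subtraction, one multiplication, one addition
BYTES_PER_CELL = 3 * 8      # read e, read h, write e
INTENSITY = FLOPS_PER_CELL / BYTES_PER_CELL  # 0.125 flop/byte

e_final, h_final = fdtd_1d()
```

    At roughly 0.125 flop per byte, each cell update does very little arithmetic per byte moved, typically well below the ratio of peak floating-point rate to memory bandwidth on modern server CPUs, so performance is limited by how fast the field arrays can be streamed.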

    Revista de Vertebrados de la Estación Biológica de Doñana

    Materials for a «Herpetofauna Balearica» 5: the geckos and turtles of the Cabrera archipelago; Ecology of an insular Mediterranean population of the Iberian skink, Chalcides bedriagai (Sauria, Scincidae); Feeding ecology of the Iberian imperial eagle (Aquila adalberti) in the Coto Doñana during chick rearing; Data on the winter diet of the black redstart (Phoenicurus ochruros) in holm-oak woodlands of western Andalusia; On staphylococcal infections in the Iberian imperial eagle (Aquila adalberti Brehm); Brief notes on the Iberian midwife toad (Alytes cisternasii Boscá); On the presence of Hyla arborea in the province of Badajoz; Some prey of Elaphe scalaris; Observations of Tarentola mauritanica in a nest of Hirundo daurica; Observation of a viperine snake, Natrix maura, in seawater; First record of the white-crowned wheatear (Oenanthe leucopyga) in the Iberian Peninsula; Observations of Phoenicopterus ruber in the Ría de Vigo (Pontevedra); Data on Myotis emarginatus in the Iberian Peninsula. Peer reviewed

    Mental health of migrants with pre-migration exposure to armed conflict: a systematic review and meta-analysis.

    BACKGROUND Exposure to armed conflict has been associated with negative mental health consequences. We aimed to estimate the prevalence of generalised anxiety disorder, major depressive disorder, and post-traumatic stress disorder among migrants exposed to armed conflict. METHODS In this systematic review and meta-analysis, we searched online databases (Cochrane Library, Embase, LILACS, PsycInfo [via Ovid], PubMed, and Web of Science Core Collection) for relevant observational studies published between Jan 1, 1994, and June 28, 2021. We included studies that used standardised psychiatric interviews to assess generalised anxiety disorder, major depressive disorder, or post-traumatic stress disorder among migrants (refugees or internally displaced persons; aged ≥18 years) with pre-migration exposure to armed conflict. We excluded studies in which exposure to armed conflict could not be ascertained, studies that included a clinical population or people with chronic diseases that can trigger the onset of mental disease, and studies published before 1994. We used a random effects model to estimate each mental health disorder's pooled prevalence and random effects meta-regression to assess sources of heterogeneity. Two independent reviewers assessed the risk of bias for each study using the Joanna Briggs Institute Checklist for Prevalence Studies. The protocol was registered with PROSPERO, CRD42020209251. FINDINGS Of the 13 935 studies identified, 34 met our inclusion criteria; these studies accounted for 15 549 migrants. We estimated a prevalence of current post-traumatic stress disorder of 31% (95% CI 23-40); prevalence of current major depressive disorder of 25% (17-34); and prevalence of generalised anxiety disorder of 14% (5-35). 
Younger age was associated with a higher prevalence of current post-traumatic stress disorder (odds ratio 0·95 [95% CI 0·90-0·99]), lifetime post-traumatic stress disorder (0·88 [0·83-0·92]), and current generalised anxiety disorder (0·87 [0·78-0·97]). A longer time since displacement was associated with a lower lifetime prevalence of post-traumatic stress disorder (0·88 [0·81-0·95]) and major depressive disorder (0·81 [0·77-0·86]). Migrating to a middle-income (8·09 [3·06-21·40]) or low-income (39·29 [11·96-129·70]) country was associated with increased prevalence of generalised anxiety disorder. INTERPRETATION Migrants who are exposed to armed conflict are at high risk of mental health disorders. The mental health-care needs of migrants should be assessed soon after resettlement, and adequate care should be provided, with particular attention paid to young adults. FUNDING Marie Skłodowska-Curie Actions (Horizon 2020-COFUND), MinCiencias (Colombia), and Swiss National Science Foundation
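    The METHODS paragraph pools prevalences with a random effects model. A minimal sketch, assuming the DerSimonian-Laird estimator on logit-transformed proportions with a 0.5 continuity correction (the abstract states neither the estimator nor the transformation), could look like this; it is not the authors' analysis code, and any counts passed to it in testing are made up, not data from the review.

```python
import math


def pooled_prevalence_dl(events: list[int], totals: list[int]):
    """Random-effects pooled prevalence (DerSimonian-Laird) on the logit scale.

    Returns (pooled prevalence, (lower, upper) 95% CI, tau^2 on the logit scale).
    """
    ys, vs = [], []
    for k, n in zip(events, totals):
        if k == 0 or k == n:            # assumed continuity correction keeps the logit finite
            k, n = k + 0.5, n + 1.0
        p = k / n
        ys.append(math.log(p / (1 - p)))         # logit(p)
        vs.append(1.0 / (n * p * (1 - p)))       # within-study variance of logit(p)
    w = [1.0 / v for v in vs]                    # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(ys) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in vs]            # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def expit(t: float) -> float:
        return 1.0 / (1.0 + math.exp(-t))

    return expit(y_re), (expit(y_re - 1.96 * se), expit(y_re + 1.96 * se)), tau2
```

    When all studies report the same proportion, Cochran's Q is zero, tau² collapses to zero and the pooled value equals the common prevalence; the 1.96 multiplier gives a normal-approximation 95% interval on the logit scale, back-transformed to a proportion.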

    Detecting Machine-obfuscated Plagiarism

    The related dataset is available at https://doi.org/10.7302/bewj-qx93. Research on academic integrity has identified online paraphrasing tools as a severe threat to the effectiveness of plagiarism detection systems. To enable the automated identification of machine-paraphrased text, we make three contributions. First, we evaluate the effectiveness of six prominent word embedding models in combination with five classifiers for distinguishing human-written from machine-paraphrased text. The best-performing classification approach achieves an accuracy of 99.0% for documents and 83.4% for paragraphs. Second, we show that the best approach outperforms human experts and established plagiarism detection systems on these classification tasks. Third, we provide a Web application that uses the best-performing classification approach to indicate whether a text underwent machine paraphrasing. The data and code of our study are openly available. Peer reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/152346/1/Foltynek2020_Paraphrase_Detection.pdf
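    The first contribution pairs word embedding models with classifiers. A minimal sketch of that kind of pipeline, assuming mean-pooled word vectors as document features and a nearest-centroid rule as a stand-in for the five classifiers (which this abstract does not name); the toy two-dimensional embeddings and labels are hypothetical, whereas a real system would load pretrained vectors.

```python
import numpy as np


def doc_vector(tokens: list[str], emb: dict[str, np.ndarray]) -> np.ndarray:
    """Mean-pool the embeddings of in-vocabulary tokens (zeros if none are known)."""
    dim = len(next(iter(emb.values())))
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def fit_centroids(X: np.ndarray, y: list[str]):
    """Return the sorted class labels and one mean feature vector per class."""
    y_arr = np.asarray(y)
    labels = sorted(set(y))
    return labels, np.stack([X[y_arr == c].mean(axis=0) for c in labels])


def predict(X: np.ndarray, labels: list[str], centroids: np.ndarray) -> list[str]:
    """Assign each row of X to the class whose centroid is nearest (Euclidean)."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return [labels[i] for i in dists.argmin(axis=1)]


# Hypothetical toy embeddings and training documents.
emb = {"w1": np.array([1.0, 0.0]), "w2": np.array([0.9, 0.1]),
       "m1": np.array([0.0, 1.0]), "m2": np.array([0.1, 0.9])}
docs = [["w1", "w2"], ["w2"], ["m1", "m2"], ["m1"]]
y = ["human", "human", "machine", "machine"]
X = np.stack([doc_vector(d, emb) for d in docs])
labels, centroids = fit_centroids(X, y)
```

    Swapping in a different embedding model only changes `emb`, and swapping in a different classifier only replaces `fit_centroids`/`predict`, which is what makes a grid of embedding-by-classifier combinations easy to evaluate.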

    Quantum walks: a comprehensive review

    Quantum walks, the quantum mechanical counterpart of classical random walks, are an advanced tool for building quantum algorithms and have recently been shown to constitute a universal model of quantum computation. Quantum walks are now a solid field of research in quantum computation, full of exciting open problems for physicists, computer scientists, mathematicians and engineers. In this paper we review theoretical advances on the foundations of both discrete- and continuous-time quantum walks, together with the role that randomness plays in quantum walks, the connections between the mathematical models of coined discrete quantum walks and continuous quantum walks, and the quantumness of quantum walks; we also summarize papers published on discrete quantum walks and entanglement and give a succinct review of experimental proposals and realizations of discrete-time quantum walks. Furthermore, we review several algorithms based on both discrete- and continuous-time quantum walks, as well as a key result: the computational universality of both continuous- and discrete-time quantum walks. Comment: Paper accepted for publication in Quantum Information Processing Journal
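    As a concrete instance of the coined discrete-time walks the review covers, the sketch below simulates a walk on a line with a Hadamard coin, starting at the origin in the coin state (|0⟩ + i|1⟩)/√2, a standard choice that yields a symmetric position distribution. The function name and the convention that coin state 0 moves left are illustrative assumptions.

```python
import numpy as np


def hadamard_walk(steps: int) -> np.ndarray:
    """Coined discrete-time quantum walk on a line with a Hadamard coin.

    Returns the probabilities of positions -steps..steps after `steps` steps.
    """
    size = 2 * steps + 1                          # large enough that no amplitude leaves the array
    psi = np.zeros((size, 2), dtype=complex)      # psi[position, coin]
    psi[steps] = np.array([1, 1j]) / np.sqrt(2)   # walker at the origin, symmetric coin state
    coin = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    for _ in range(steps):
        psi = psi @ coin.T                        # apply the coin at every position
        shifted = np.zeros_like(psi)
        shifted[:-1, 0] = psi[1:, 0]              # coin state 0 moves one site left
        shifted[1:, 1] = psi[:-1, 1]              # coin state 1 moves one site right
        psi = shifted
    return np.sum(np.abs(psi) ** 2, axis=1)


probs = hadamard_walk(100)
positions = np.arange(-100, 101)
spread = np.sqrt(np.sum(probs * positions ** 2))  # standard deviation (the mean is 0)
```

    The spread of the quantum walk grows linearly with the number of steps, whereas an unbiased classical random walk has standard deviation √t; after 100 steps the quantum spread is far larger than the classical value of 10. This ballistic spreading is one reason quantum walks are useful for building algorithms.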

    Arterial line pressure control enhanced extracorporeal blood flow prescription in hemodialysis patients

    Background: In hemodialysis, the recommended extracorporeal blood flow (Qb) is 300–500 mL/min. To achieve the best Qb, we based our prescription on dynamic arterial line pressure (DALP).
    Methods: This prospective study included 72 patients with catheters (Group 1, G1; 1877 treatments) and 35 with arteriovenous (AV) fistulae (Group 2, G2; 1868 treatments). The dialysis staff was trained to prescribe a Qb sufficient to obtain a DALP between -200 and -250 mmHg. We measured ionic clearance (IK, mL/min), access recirculation, DALP (mmHg) and Qb (mL/min). Six prescription zones were identified, ranging from an optimal zone A (Qb > 400, DALP -200 to -250) to zones with lower Qb: E (Qb < 300, DALP -200 to -250) and F (Qb < 300, DALP > -199).
    Results: The distribution of treatments in zone A was 695 (37%) in G1 vs. 704 (37.7%) in G2 (P = 0.7), and in zone B 150 (8%) in G1 vs. 458 (24.5%) in G2 (P < 0.0001). Recirculation in zone A was 10.0% (interquartile range, IQR 6.5–14.2) in G1 vs. 9.8% (IQR 7.5–14.1) in G2 (P = 0.62). IK in zone A was 214 ± 34 mL/min (G1) vs. 213 ± 35 mL/min (G2) (P = 0.65). ANOVA of IK across G2 zones showed a difference between zone A and zones C and D (P < 0.000001). Staff adherence to the prescription was 81.3% (G1) vs. 84.1% (G2) (P = 0.02).
    Conclusion: An optimal Qb can be prescribed with a DALP of -200 mmHg. Staff adherence to DALP-based treatment prescription reached 81.3% for catheters and 84.1% for AV fistulae.